ABSTRACT

This investigation examined the value of using a multimethod- 
multisource approach to assess high-technology training systems. 
This research strategy was utilized to provide empirical 
information regarding the Reserve Component Virtual Training 
Program's (RCVTP's) instructional effectiveness. Observers 
collected data from nine units; fourteen RCVTP instructors 
completed standard rating forms regarding the performance of 38 
armored force units; and 280 training participants completed 
Likert-scale items regarding their training experience. Data
from the different methods indicated that the units further 
developed their collective tactical skills across the training 
period. The advantages and problems with using a multimethod- 
multisource research strategy for assessing high-technology 
training systems were then discussed. 
Multimethod Multisource Approach for Assessing High-Technology
Training Systems

This paper addresses the problem of evaluating the 
effectiveness of a high-technology based training system in a
non-controlled context without the possibility of obtaining
either baseline or transfer measures of performance. Such
nontraditional evaluations may become more prevalent for the
military training community as the resources to conduct
controlled transfer evaluations become more scarce. The high
costs in time, money, and personnel associated with traditional
transfer evaluations are well recognized by trainers and
researchers (Blaiwes & Regan, 1986).

Evaluation Issues 

Selecting Appropriate Data Collection Methods. Evaluating a
high-technology training program in a non-controlled context
without the possibility of obtaining either baseline or transfer
measures of performance presents several interesting challenges.
As is the case with any evaluation, the primary challenge
involves determining the most appropriate method(s) for
collecting the data. This issue is even more pronounced for
high-technology training systems (such as the Simulation
Networking or SIMNET system¹) which have not been equipped with
any device for providing quantitative performance data.
Researchers must then collect data through observations (Gound &
Schwab, 1988), questionnaires (Brown et al., 1988), or
instructor ratings (Bessemer, 1991; Shlechter, Bessemer, &
Kolosh, 1991).

¹ See Garvey & Radgowski, 1988 for a detailed description of
this system.

Each method is potentially problematic. Observational
methods are labor intensive, a situation which limits the sample
size. These methods may also be contaminated by the observers'
subjectivity and corresponding problems with reliability (P.A.
Adler & Adler, 1994). Questionnaires can be tainted by the
students' inability to report accurately the effects of the
training device on their performance. The accuracy of self-reports
has been hotly debated by psychologists (e.g., Burnside,
1982; Herrmann, 1982). Instructor ratings may be contaminated by
the expectations or biases held by the instructors (Cook &
Campbell, 1979). These instrumentation problems are more critical
in non-controlled or quasi-experimental designs, which are
susceptible to extraneous variables (Cook & Campbell).

One approach for dealing with such extraneous variables is 
found in Bessemer's (1991) quasi-experimental evaluation of
SIMNET. This evaluation consisted of obtaining instructor ratings 
for 1705 Armor Officer Basic (AOB) students of which 1059 did not 
receive SIMNET training and 646 did. Multiple regression 
techniques were used to help remove the effects associated with 
extraneous variables, e.g., instructor biases in the ratings. 

There were two main difficulties with implementing
Bessemer's evaluation approach. One, his evaluation involved a
large sample size, which might be difficult to obtain in a future
of dwindling resources for research and development. Two,
statistical adjustments can never fully substitute for
experimental controls.

A multimethod evaluation approach has also been suggested as
a technique for circumventing the cited limitations of
naturalistic evaluations (e.g., Denzin, 1978 as cited in Patton,
1987; Denzin & Lincoln, 1994; Cook & Campbell, 1979). This
approach is expected to provide a more in-depth understanding of
the phenomenon under study than could be provided by the use of
any single evaluation methodology (Denzin & Lincoln). Also,
areas of agreement between methods would boost confidence in the
data's internal and construct validity (Cook & Campbell). And,
Scandal, Money, Grainier, & Hall (1983) have noted that self-report
methodologies may serve to strengthen and refine data from
other, more generalized approaches toward predicting task
performance.

Denzin (1978) has also suggested that naturalistic
evaluations should sample data from a variety of
sources. Each source could provide a different perspective
regarding the training situation. Observers who are not part of
the training process may view a subject's performance differently
from an instructor who is part of the process. Perhaps then, a
multimethod-multisource approach should be employed when
conducting a quasi-experimental evaluation of high-technology
training systems.

Sampling Adequate Criterion Measures. A multimethod-multisource
design can be useful for helping evaluators to meet
another long-standing challenge: sampling adequate criterion
measures. As noted by Gagne (1954):

"The 'criterion problem' has been with us (researchers of
training devices and simulators) for a long time." (p. 95)
More recently, Shute and Regian (1993) have noted that 
sampling adequate criterion measures has been a problem which has 
plagued evaluations of high-technology training systems, 
especially those systems designed to help students to become 
proficient in performing complex tasks. This problem has been 
manifested, for example, in nearly all previous evaluations of 
SIMNET's effectiveness (Kraemer & Bessemer, 1987; Bessemer, 1991; 
Brown, Paschal, & Southard, 1988; TEXCOM Combined Arms Test 
Center, 1990; Shlechter, Bessemer, & Kolosh, 1991). These studies
have focused on measuring differences in the SIMNET-trained and
control units' abilities to perform certain standard Army 
training and evaluation program tasks. 

Cognitive psychologists, however, have recently argued that 
expertise involves more than successfully performing a set of 
tasks (Collins, Brown, & Newman, 1989; Kraiger, Ford, & Salas, 
1993; Patrick, 1992). Experts are able to perform the
same task more quickly than less advanced students (Kraiger
et al.). That is, expertise involves "automatizing" the important
skills associated with task performance. Expertise also involves
the ability to attend to task cues without too much reliance on
instructor prompting and to articulate the reasons for one's
actions (Collins et al., 1989; Patrick).

Multimethod-Multisource Evaluations. Multimethod-multisource
evaluations of training systems are not a novel idea.
Finley, Rhinehelder, Thompson, and Sullivan (1972), for example,
used both experimental and field evaluation designs for
evaluating the training effectiveness of a naval air traffic
control center training device. And, Brehmer & Dorner (1993)
have suggested that investigations of computer-simulated
microworlds include experimentation and case studies.

A search of several bibliographic databases and literature
reviews (Adams, 1978; Hays & Singer, 1989; Orlansky, Dahlman,
Hammon, Metzko, Taylor, & Youngblut, 1994; van Berkum & DeJong,
1991) located few naturalistic evaluations of
high-technology training systems which employed a
multimethod-multisource approach. Adams has noted that evaluations of flight
simulators have mainly consisted of controlled transfer studies
or equipment ratings by subject-matter experts (SMEs). Adams has
argued that both methods are highly flawed.

Furthermore, the more recent literature on training
simulations has tended to consist of: (a) analytic estimates of the
system's training capabilities (Burnside, 1990); (b) controlled
transfer studies (e.g., McAnulty, 1992; Swezey, Perez, & Allen,
1991); (c) field studies involving multimethods but not
multisources (e.g., Lesgold, Lajoie, Bunzo, & Eggan, 1988); (d)
equipment ratings by SMEs (Harrison, Acchione-Noel, Butler,
Nantze, & Walker, 1992); (e) cost estimates (see Orlansky et al.,
1994); and (f) investigations of the system's fidelity (see Hays
& Singer, 1988). Lesgold et al., for example, used both a
"think-aloud" protocol ("What are you now thinking?") and simulated
recall methodology ("What did you think?") for their field
investigation of the Sherlock troubleshooting systems.

Overview of Current Study

Objective. This study was thus designed to illustrate the
advantages and limitations of using a multimethod-multisource
approach for conducting a naturalistic evaluation of a
high-technology based training program. The training program evaluated
was the Reserve Component Virtual Training Program (RCVTP), which
has been implemented at Fort Knox, KY.

Brief Description of the RCVTP. The RCVTP has been
developed through congressional funding to improve the training
of Army National Guard (ARNG) units. This funding was made
available because ARNG units, which are becoming an increasingly
important element of post-Cold War combat, have limited training
resources and time, with only 39 days allocated for training per
year.

This program's primary goal involved having ARNG units
experience National Training Center (NTC)-like missions in a
time-compressed manner. Providing NTC-like training in a
time-compressed manner involved utilizing the available high
technology training simulation systems at Fort Knox, KY. The
primary simulation utilized by the RCVTP was the SIMNET system.

Providing NTC-like training for ARNG armored units also
entailed developing a structured set of SIMNET training exercises
(training tables). This structure consisted of having units
perform actions (critical subtasks) associated with specific
training objectives and cues. Examples of critical subtasks
included: (a) reaching the starting point on time; (b) executing
fires when the enemy crosses the trigger line; and (c) conducting
displacement as directed.

Approximately one hundred such training tables were created
for this training program, with each training table designed to be
conducted in two hours. Units spent one half-hour
preparing for the mission, one hour executing the mission, and
another half-hour participating in an after-action review (AAR)
of the exercise. (See Shlechter, Bessemer, Nesselroade, &
Anthony, 1995, for more detailed information regarding this
training program.)

The RCVTP training managers felt that conducting an 
evaluation with experimental controls would encroach upon their 
training program. That is, they wanted the training conditions 
for the RCVTP's formative evaluation to be very similar to the
training conditions for the fully implemented program. 

Methods used. The methods and sources used to assess the
RCVTP were based on the previous SIMNET evaluations. These
different assessments consisted of observations by evaluators,
instructors' (RCVTP observer/controllers', or O/Cs') judgments of
performance, and participants' questionnaire responses. These
methods are further delineated in the sections dealing with
Assessment A (observations by researchers), Assessment B (O/Cs'
judgments), and Assessment C (questionnaire responses).

Participants sampled. Most of the participants for this 
evaluation were part of the developmental trials phase of the
RCVTP. This phase took place during the Winter and Spring of 
1994. All units volunteered to participate in this evaluation. 

Assessment A: The Observers' Reports 

This assessment was conducted by researchers from the U.S. 
Army Research Institute for the Behavioral and Social Sciences 
(ARI), who were independent of both the instructional design and
training process. To save costs, these observers sampled targeted 
training units. 

Method 

Participants. Nine units were observed. These units
consisted of three armor companies, two armor company teams, and 
two armor platoons. All were ARNG units with the exception of 
one company team and one armor company who were active units 
stationed at Fort Knox, KY. 

Instruments. The RCVTP Training Observation Form was
created to measure the different aspects of tactical skills
expertise. This instrument allowed the observers to collect data
on: (a) time taken for preparations; (b) O/Cs' and units' actions
for the training table; (c) time taken to complete the training
table; (d) problems encountered during a mission, such as radios
not working or the unit's failure to send a report; and (e) coded
entries about the AARs. There was also room in this form for
making comments about the mission.

The observers also completed five Likert-scaled items
dealing with the mission and 22 items dealing with the AAR. A 1-5
scale (1 as never and 5 as always) was used for these items; it
was based on a scale developed by Kraemer and Wong (1992). The
items for the exercises were based on the critical subtasks
associated with the RCVTP.

Training the observers. The observers were two research
psychologists and a graduate student intern with the Armored
Forces Research Unit at Fort Knox, KY. Their training consisted of
systematically going over a detailed set of instructions. They
were also directed to view videotapes of an AAR conducted when
the RCVTP training tables were being piloted. Also, two of the
observers read the Army manual on platoon tactics (FM 17-15: U.S.
Department of the Army, 1987). The third observer was quite
knowledgeable with regard to platoon and company tactics.

Data collection procedure. Data were collected for the 
sampled units by three observers. Because of constraints imposed 
by the training trial procedures, these observers were rarely 
able to record data for the same training tables. The sequence 
of training tables viewed by each observer varied from unit to 
unit. Observer A, for example, viewed the first three training 
tables for Unit 1 and the last three training tables for Unit 2. 
This variation helped to control for possible data biases due to
systematic observer differences. 

A reliability check was conducted as observers were able to
follow the same training tables for one unit. Few discrepancies
were found among observers with regard to the performance data.

Scoring procedures. Two judges scored the observational
reports based on a predetermined scoring scheme. The few 
discrepancies found in this scoring were resolved by a discussion 
between the judges. 

Results and Discussion 

Data for training table performance. Kendall (1975) τ
rank-order correlations were computed to determine the existence
of any significant trends in the units' exercise time, errors,
and coaching scores across successive exercises in their RCVTP
training. The alpha level for the statistical tests done in this
evaluation was set at .10. Because of the limited sample size,
these analyses also involved combining the data across platoons
and companies and across active and ARNG units. Also, data for
one training table were not recorded (see Table 1).

Table 1.

Means and Standard Deviations of the Units' Time in Min, Error
Rates, and Coaching Scores for Successive Training Tables

Training           Time in Min       Error Rate      Coaching Score
Tables      n       M       SD        M      SD        M      SD

First       9     85.22   30.40     12.89   4.81      8.44   5.21
Second      9     52.00   23.04      6.11   2.93      4.22   3.03
Third       8ᵃ    40.88   10.51      5.00   2.44      3.75   2.81
Fourth      8     41.00   12.59      6.38   4.43      5.25   3.99
Fifth       7     37.57   20.33      5.14   2.79      3.42   2.14
Sixth       3     32.00   10.44      1.67    .58      2.00   1.73

ᵃ Data missing for one unit.
As shown in Table 1, these units typically took less time,
made fewer errors, and needed less coaching as their training
progressed. Significant negative trends associated with these
measures were confirmed by statistical analyses (see Table 2).
These trends were not a function of the units' being less likely
to finish their later training tables. Units were found to be
more likely to complete their fifth training table than
their first training table. The RCVTP thus seemingly helped these
units to develop their collective tactical skills.



Table 2

Weighted Mean τ-Values and Tests of Significance Including All
RCVTP Tables

Variables    Weighted mean τ*   (heading      (heading      (heading
                                illegible)    illegible)    illegible)

Times             -.574            .274           8           6.31**
Errors            -.340            .174           8           5.86**
Coaching          -.206            .186           8           3.32**

* Negative number indicates a decreasing trend.
** p < .05
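The trend analysis described above can be illustrated with a minimal sketch. It assumes the simple tau-a form of Kendall's statistic and applies it to the group means reported in Table 1, because the per-unit observations are not given here. The evaluation computed the correlations per unit and then weighted them, so this sketch is not expected to reproduce the Table 2 values; it only shows how a decreasing trend across successive training tables yields a negative τ. The function and variable names are illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x2 - x1) * (y2 - y1)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Group means from Table 1 (first through sixth training table).
table_order = [1, 2, 3, 4, 5, 6]
mean_time_min = [85.22, 52.00, 40.88, 41.00, 37.57, 32.00]
mean_errors = [12.89, 6.11, 5.00, 6.38, 5.14, 1.67]
mean_coaching = [8.44, 4.22, 3.75, 5.25, 3.42, 2.00]

tau_time = kendall_tau_a(table_order, mean_time_min)
tau_errors = kendall_tau_a(table_order, mean_errors)
tau_coaching = kendall_tau_a(table_order, mean_coaching)

for label, tau in [("time", tau_time), ("errors", tau_errors),
                   ("coaching", tau_coaching)]:
    print(f"{label}: tau = {tau:.3f}")
```

All three coefficients come out negative, which matches the direction of the trends reported in Table 2.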

Questions exist, however, about the generalizability of the
RCVTP's training effectiveness. The previously cited improvement
trends could have reflected units' becoming more adept at using
the SIMNET system. The observers did feel that units were more
disoriented in their first training table than in the second
training table, with reported means of 2.89 and 2.00 for training
tables 1 and 2, respectively. This difference was statistically
significant, t(8) = 2.10, p < .10. The observers' comments also
indicated that most of the coaching was done vis-a-vis the units'
problems with: (a) orientation in the SIMNET terrain, (b) use of
the SIMNET radios, and (c) formations.

Data for AARs. These data were problematic. For one
thing, poor reliability was found among observers. Also,
observers had trouble following and recording the content of
these discussions. Finally, fewer units were sampled for these
data than for the performance data, because AARs were not always
given after each mission.

The AAR data did indicate that SIMNET-related problems were
not an issue for these units. Fewer than 1% of their reported
comments in any given AAR dealt with SIMNET. Also, observers
indicated that the participants rarely asked questions about or
made comments about using SIMNET, with an overall mean rating of
1.30 for this AAR summary item.

Summary of this assessment. This assessment did provide a
picture of the RCVTP's effectiveness. However, additional
evidence based on a larger sample is needed to confirm this
assessment's findings. This evidence also needs to be based on
performance judgments made by subject-matter experts. Such
judgments are described in the next section, Assessment B.

Assessment B: The O/Cs' Judgments

Fourteen O/Cs provided these data. An O/C typically assessed
the performance of four units. Occasionally, two or more O/Cs
were identified as working together on an assessment.

Method 

Participants. Data were collected on 38 armored force
units. These units consisted of 17 armor platoons, 10 armor
companies, 5 scout platoons, and 6 mechanized infantry platoons.
Only 5 active units (4 armor companies and 1 scout platoon) were
included in this sample. This sample also included the units that
were sampled in Assessment A.

Data collection procedure. For each training table
completed by these units, the O/Cs indicated in structured rating
forms those subtasks in which units needed either to "train to
sustain" or "train to improve," representing satisfactory or
unsatisfactory performance, respectively.

Scoring procedures and measures. Two independent judges
identified subtasks which these units performed at least twice.
O/Cs' ratings were then categorized into measures indicating
changes in these units' subtask proficiency as their training
progressed. One set of measures dealt with subtask proficiency
changes associated with units' initial and final performance of a
subtask. These measures were: (a) improve/sustain; (b)
sustain/sustain; (c) improve/improve; and (d) sustain/improve.
Also tabulated was the total number of "train to improve" and
"train to sustain" ratings for these units' initial and final
performance of the different subtasks.
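The initial/final categorization described above can be sketched as follows. This is a minimal illustration rather than the judges' actual scoring scheme: the data layout, the function names, and the example ratings (including the subtask labels) are hypothetical, and only subtasks with at least two ratings are categorized, as in the text.

```python
from collections import Counter


def transition_category(ratings):
    """Label a subtask by its first and last O/C rating.

    `ratings` is the chronological list of ratings for one subtask,
    each either "improve" (train to improve) or "sustain" (train to
    sustain). Intermediate ratings are ignored in this measure.
    """
    if len(ratings) < 2:
        raise ValueError("a subtask needs at least two ratings")
    return f"{ratings[0]}/{ratings[-1]}"


def tabulate(subtask_ratings):
    """Count subtasks in each initial/final category for one unit."""
    return Counter(transition_category(r) for r in subtask_ratings.values())


# Hypothetical ratings for one unit, keyed by subtask.
example_unit = {
    "reach starting point on time": ["improve", "sustain", "sustain"],
    "execute fires at trigger line": ["sustain", "sustain"],
    "conduct displacement as directed": ["improve", "improve"],
    "send report": ["sustain", "improve"],
}

counts = tabulate(example_unit)
for category in ("improve/sustain", "sustain/sustain",
                 "improve/improve", "sustain/improve"):
    print(category, counts[category])
```

Summing such per-unit counts by unit type would give a table with the same layout as Table 3.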

Another set of measures involved examining these units'
subtask proficiency across training tables. These measures
consisted of counting, separately, the number of ratings for each
training table which dealt with units': (a) first performance of
a subtask (first subtasks) and (b) later performances of the same
subtask (later subtasks). First and later subtasks were counted
separately, because the former measure provided an indication of
these units' subtask proficiency prior to the RCVTP.

Results and Discussion

Wilcoxon signed-rank tests for matched pairs were computed
on the initial and final performance rating data.

Data for the initial and final performance measures. As
shown in Table 3, a total of 359 subtasks had at least two
ratings. Based on these frequencies, the percentage of subtasks
with train to sustain ratings increased from 61.8% to 78.6%.
Furthermore, when the subtasks with ratings of train to improve
were compared to the subtasks with ratings of train to sustain, a
significant majority (74.6%) of them were train to sustain.

These units thus seemingly became more proficient in these
subtasks as their training progressed. This observation was
confirmed by the data analyses, as significantly more subtasks
were included in the improve/sustain category as compared to the
subtasks included in the improve/improve category. Also,
significantly more subtasks were included in the sustain/sustain
category than in the sustain/improve category. (See Table 4 for
the results of the statistical tests.)
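The source does not state how the z values for the Wilcoxon tests were obtained. The sketch below assumes the usual large-sample normal approximation for the signed-rank statistic T, without tie or continuity corrections, and takes as inputs the T values and numbers of non-zero differences reported in Table 4. The function name is illustrative.

```python
import math


def wilcoxon_z(t_statistic, n_nonzero):
    """Normal approximation for the Wilcoxon signed-rank statistic T.

    Assumes no tie correction and no continuity correction; `n_nonzero`
    is the number of non-zero paired differences.
    """
    mean_t = n_nonzero * (n_nonzero + 1) / 4
    sd_t = math.sqrt(n_nonzero * (n_nonzero + 1) * (2 * n_nonzero + 1) / 24)
    return abs(t_statistic - mean_t) / sd_t


# Improve/sustain versus no change, all units: T = 115.50, N = 31.
z_all_units = wilcoxon_z(115.50, 31)
# Improve/sustain versus no change, armor platoons: T = 15.00, N = 14.
z_armor_platoons = wilcoxon_z(15.00, 14)

print(f"all units: z = {z_all_units:.2f}")
print(f"armor platoons: z = {z_armor_platoons:.2f}")
```

Under these assumptions the computed values agree, to two decimals, with the z values reported for those rows, which suggests (but does not confirm) that this approximation was the one used.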

Data analyses also revealed that the armor and mech/scout 
platoons were more likely to improve than were the armor 
companies. In subtasks rated train to sustain, for example, the 
Table 3

Initial and Final Subtask Rating Counts by Unit Type

                             Improve-   Sustain-   Improve-   Sustain-
Type of unit            n    sustain    sustain    improve    improve

Armor Platoons         17       46         90         20         12
Mech/scout Platoons    11       20         45          6          6
Armored Companies      10       25         56         20         13
Total                  38       91        191         46         31



Table 4.

Wilcoxon Signed-Rank Tests of Numbers of Subtasks in Categories
Based on First and Last O/C Ratings

Type of Unit             nᵃ     Nᵇ       T         z

Improve/Sustain versus No Changeᶜ

All units                38     31    115.50    2.60***
Armor platoons           17     14     15.00    2.35**
Mech/scout platoons      11      8      6.00    1.68*
Armor companies          10      9     19.50     .35

Sustain/Improve versus No Change

All units                38     34      8.00    4.95***
Armor platoons           17     16      0.00    3.52***
Mech/scout platoons      11      8      2.00    2.24**
Armor companies          10     10      1.00    2.70***

Improve/Sustain versus Sustain/Improve

All units                38     32     60.00    3.81***
Armor platoons           17     15      8.00    2.95***
Mech/scout platoons      11      9      7.00    1.83*
Armor companies          10      8      7.00    1.54

ᵃ Number of units. ᵇ Number of non-zero differences. ᶜ No Change
category includes both sustain/sustain and improve/improve sets
of ratings.

*p < .10. **p < .05. ***p < .01.
armor companies' gain from first to last rating (60.5% to 71.1%)
was about half that of the armor and mech/scout platoons' gain
(62.4% to 82.0%).
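The percentages quoted in this section can be recomputed from the counts in Table 3. The sketch below assumes that the initial "train to sustain" percentage is the share of subtasks in the sustain/sustain and sustain/improve categories, and that the final percentage is the share in the improve/sustain and sustain/sustain categories. The dictionary layout and function name are illustrative.

```python
# Counts from Table 3, in the order:
# (improve/sustain, sustain/sustain, improve/improve, sustain/improve)
table3_counts = {
    "armor platoons": (46, 90, 20, 12),
    "mech/scout platoons": (20, 45, 6, 6),
    "armor companies": (25, 56, 20, 13),
}


def sustain_percentages(rows):
    """Return (initial %, final %) of subtasks rated train to sustain."""
    imp_sus = sum(r[0] for r in rows)
    sus_sus = sum(r[1] for r in rows)
    imp_imp = sum(r[2] for r in rows)
    sus_imp = sum(r[3] for r in rows)
    total = imp_sus + sus_sus + imp_imp + sus_imp
    initial = 100 * (sus_sus + sus_imp) / total  # first rating was sustain
    final = 100 * (imp_sus + sus_sus) / total    # last rating was sustain
    return round(initial, 1), round(final, 1)


all_units = sustain_percentages(list(table3_counts.values()))
companies = sustain_percentages([table3_counts["armor companies"]])
platoons = sustain_percentages([table3_counts["armor platoons"],
                                table3_counts["mech/scout platoons"]])

print("all units:", all_units)
print("armor companies:", companies)
print("armor and mech/scout platoons:", platoons)
```

Under this reading the counts give 61.8% to 78.6% for all units, 60.5% to 71.1% for the armor companies, and 62.4% to 82.0% for the platoons, so the companies' gain of about 10.6 points is roughly half the platoons' gain of about 19.6 points.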

Data for the first and later subtasks by unit type. As
shown in Figure 1, the sustain percentage for these units' first
performance of a subtask varied around 59.9%, with only a modest
increase across successive tables. This trend suggested that some
small generalized transfer effects helped offset an expected
decrease in performance when the more difficult subtasks were
initially encountered in later training tables.

Their performances for later occurrences of a subtask 
increased substantially for their third, fourth, and seventh 
training tables. Small increases were found for their fifth and 
sixth training tables. These findings provide further evidence 
that these units became more proficient as the result of practice 
afforded by the RCVTP. Correspondingly then, the trends found 
for Assessment A were not just a function of the units learning 
to use SIMNET. 

Summary of Assessment B. This assessment provided
additional evidence for the training effectiveness of the RCVTP.
It has thus provided some answers to the
questions posed from Assessment A. Questions, however, still
remained about the participants' feelings toward this training
program.
Figure 1. Percent of "first" and "later" subtasks with "train to
sustain" ratings by successive training tables.

Assessment C: The Units' Questionnaire Responses 

This section is based on the formative assessment of the 
RCVTP as conducted by the instructional design team.

Method 

Participants. Questionnaire data were collected on 280
participants from the developmental trials. This sample thus
included participants from Assessments A and B.

Two hundred thirty-nine of these participants were unit
leaders, e.g., company commanders, platoon leaders, and tank
commanders. These participants came from: (a) 19 armor
companies, (b) 12 armor platoons, (c) 3 scout platoons, and (d) 3
mechanized infantry platoons. They included 206 ARNG and 74
active soldiers, who were from armor companies.

Instrument. The instrument used for this assessment was a
40-item Likert-scaled self-report questionnaire. The items
included the participants' perceptions regarding: (a) their
proficiency before the training; (b) their proficiency after the
training; (c) the training benefits of the RCVTP as compared to
other SIMNET training experience; and (d) the various aspects of
the RCVTP. The scale for these items ranged from seven as the
most positive to one as the least positive, with four as a neutral
point. The participants were also given the opportunity to
provide reasons for their answers to the different questions.

Data collection procedure. All ethical guidelines
prescribed by ARI and the American Psychological Association were 
followed when the questionnaire was administered at the end of 
the participants' training. 

Results and Discussion 

Data regarding levels of proficiency. A significant
difference was found regarding participants' estimates of pre-
and post-RCVTP training proficiency levels, t(238) = 19.55, p <
.001.² The participants, regardless of unit type, claimed to be
more proficient after training (M = 5.44) than they were before
training (M = 3.95).

² The number of participants is fewer than 280 because only the
data from unit leaders were analyzed.

Also, comparisons between unit types on a difference score
(before-training estimates minus post-training estimates)
revealed a "significant interaction" between unit types and
proficiency levels. As shown in Figure 2, leaders from the
reserve company units indicated that they improved more than did
their active counterparts. This interaction seemingly occurred
because ARNG unit leaders claimed to be at a lower level of
initial proficiency than did their active counterparts. This
training program thus raised the confidence levels of the ARNG
armor company leaders to the claimed post-training levels of the
active company leaders.
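The pre/post comparison reported earlier in this section has the form of a paired t-test, with degrees of freedom one less than the 239 unit leaders analyzed. The individual responses are not given in this paper, so the sketch below uses a small set of hypothetical 7-point ratings purely to show the computation; the function name and data are illustrative. It takes the difference as post minus before, so a positive t indicates a claimed gain.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for matched ratings."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # Standard error of the mean difference uses the sample SD (n - 1).
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical self-ratings on the 7-point scale for five leaders.
before_training = [3, 4, 4, 5, 3]
after_training = [5, 6, 5, 6, 5]

t_value, df = paired_t(before_training, after_training)
print(f"t({df}) = {t_value:.2f}")
```

With real data, the same function applied to the 239 leaders' before and after estimates would yield a statistic on 238 degrees of freedom.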

Data regarding the participants' perceptions of the RCVTP.
Participants, regardless of their designation, believed that
improvement in their unit's performance was a function of the
RCVTP. Means of 5.43 and 5.54 were found for the items dealing,
respectively, with improvement as a function of the time in the
simulators and the AARs. They furthermore indicated that they
became more proficient after this training than after their other
SIMNET training experiences, with a mean of 5.66 for this item.

The questionnaire data also provided some insights into
the participants' feelings about components of the RCVTP's
instructional design. One, they felt that discovery learning did
take place, with mean scores of 5.70 and 5.75 for items dealing
with their AAR comments helping them to improve on the platoon
training tables and the company training tables, respectively.
Two, these training tables were viewed as becoming more difficult
as the training progressed. A mean of 5.55 was found for the item
dealing with this issue.
Figure 2. Means of proficiency estimates by unit leaders from
reserve components and active armor companies.

Sample of participants' comments. As indicated by the
quantitative data, the participants' comments tended to be
positive. They were most appreciative of the training
opportunity. One participant wrote:

"We have no opportunity for company level maneuvering at
home station and the opportunity for that here is
priceless."

Another participant stated:

"I believe that these missions with the simulators are the
most effective training that I have had... I hope that we
receive more SIMNET training in the future."

Their comments also indicated that improvement was a function of 
the RCVTP. A participant claimed that during the RCVTP the unit 
was less likely to get lost in the SIMNET terrain than during 
their previous SIMNET training. Another wrote:

"(RCVTP was) a very valuable training program. Ability to
do a lot of movement in a short time."

There were a few negative comments, however. A participant from
an armor company noted:

"When a unit first arrives I believe that we went from crawl
to run (basic to complex), instead of crawl, walk, run. It
made it (the RCVTP) a little bit more difficult than (it)
should be."

And a unit leader from an armor company wrote that the O/Cs
should use their visual aids more during the AARs. These
suggestions could help make the RCVTP an even better
instructional program.

Summary of Assessment C. This program's effectiveness has 
thus been established from the perspective of ARNG users. ARNG
units seemingly then would like to utilize this program for their 
future collective tactical training. This assessment also 
provided insights into the reasons for the participants' positive 
perceptions of this program and possible ways of improving the 
RCVTP. 
Summary and Conclusions

Data from the different methods and sources indicate that
the units developed their collective tactical skills across the
training period. This evaluation has thus demonstrated the
RCVTP's instructional value for helping tactical units to become
more proficient.

Value of Multimethod-Multisource Approach 

This evaluation has also further demonstrated the value of 
employing a multimethod-multisource evaluation strategy for 
conducting naturalistic evaluations of high-technology based 
training systems. As stated, each method might have provided 
problematic data. The observational data, for example, were 
limited by their small sample size and the exclusion of defensive 
tables. Areas of agreement among assessments thus provided more 
valid conclusions than any single assessment method would have 
provided. 

Also, each assessment yielded insights into this training 
situation from complementary perspectives. As indicated, the 
observational data reflected the perspective of evaluators who 
were independent of the instructional design and training 
processes; however, they were not subject-matter experts. The 
instructors were subject-matter experts but were part of the 
instructional process. The questionnaires tapped the users' 
perspective. Taken together across these different methods and 
sources, the evidence for the RCVTP's effectiveness becomes more 
compelling. 

Each assessment also provided complementary insights into 
the training situation. The observational data indicated that the 
participants were apparently able to attend to the cues inherent 
in the RCVTP tables without too much reliance on instructor 
prompting. Also, insights into users' feelings about their 
learning process were provided by the questionnaire data. Product 
and (some) process outcomes associated with the RCVTP were thus 
obtained in this evaluation, whereas other SIMNET evaluations 
(e.g., Shlechter, Bessemer, & Kolosh, 1991) have obtained only 
product outcomes. 

Obtaining process outcomes provided these evaluators with 
further confidence in this evaluation's internal validity. 

Unlike Shlechter et al.'s (1991) SIMNET evaluation, this 
evaluation demonstrated that the cited improvements associated 
with the training system were not an artifact of additional 
instructor prompting. 

Also, quantitative and qualitative data were obtained in 
this evaluation with the latter providing meaning to the former. 
As indicated, the participants' comments provided insights into 
their reasons for wanting to use this program. Adequate criterion 
measures were thus seemingly sampled in this evaluation. 

Problems with this evaluation 

These researchers had trouble collecting some process 
outcomes, especially those associated with the AARs. That is, we 
were not able to assess the participants' ability to articulate 
the reasons for their actions. Our problems with the AAR data 
may have resulted from trying to collect too much data. An 
observer, for example, claimed that the AARs went too fast for 
him to record all the requested information. Researchers must 
therefore not overwhelm their data collectors by having them 
collect too much data. 

We should also have matched "data source" more carefully 
with the method. For instance, some of these observers reported 
having problems judging the units' performance. One observer 
maintained that the RCVTP instructional personnel had to 
continually help him with his ratings. These observers, perhaps, 
were best suited for collecting the more objective data, while 
performance judgments should have been left to the experts. 

Closing statement 

This investigation has further delineated the advantages of 
and problems with conducting multimethod-multisource evaluations 
of high-technology based training systems. As discussed, the use 
of multiple methods and sources has provided us with a better 
understanding of the RCVTP's effectiveness than could be provided 
by any single method and source. Also, this research strategy 
provides a viable approach to evaluating a high-technology based 
training system in a non-controlled context without the 
possibility of obtaining either baseline or transfer measures of 
performance. As previously stated, such evaluations may become 
more prevalent as the resources to conduct more controlled 
evaluations become more scarce. 

These authors must finally address the problem of 
information regarding evaluation techniques. As indicated, it 
was nearly impossible to find any information on this topic in 
the different bibliographical data bases (DTIC, ERIC, or 
PSYCHLIT). Perhaps a common source delineating the lessons 
learned from different evaluation techniques is needed. 
Otherwise, valuable research time may be lost as researchers are 
continually "re-inventing the wheel." 